Smart Paper Notes

Neural-Rendezvous: Learning-based Robust Guidance and Control to Encounter Interstellar Objects

Hiroyasu Tsukamoto , Soon-Jo Chung , Benjamin Donitz , Michel Ingham , Declan Mages , Yashwanth Kumar Nakka

Categories: Robotics | Artificial Intelligence | Machine Learning

2022-08-09

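Interstellar objects (ISOs), astronomical objects that are not gravitationally bound to the Sun, are likely representatives of primitive materials and are invaluable for understanding exoplanetary star systems. However, because their orbits are poorly constrained, with generally high inclinations and relative velocities, exploring ISOs with conventional human-in-the-loop approaches is very challenging. This paper presents Neural-Rendezvous, a deep learning-based guidance and control framework for encountering any fast-moving object, including ISOs, robustly, accurately, and autonomously in real time. It uses minimum-norm tracking control on top of a guidance policy modeled by a spectrally normalized deep neural network, whose hyperparameters are tuned with a newly introduced loss function that directly penalizes the state trajectory tracking error. We rigorously show that, even in the challenging case of ISO exploration, Neural-Rendezvous provides 1) a high-probability exponential bound on the expected spacecraft delivery error; and 2) a finite optimality gap with respect to the solution of model predictive control, both of which are indispensable, especially for such a critical space mission. In numerical simulations, Neural-Rendezvous is shown to achieve a terminal delivery error of less than 0.2 km for 99% of the ISO candidates with realistic state uncertainty, while retaining computational efficiency sufficient for real-time implementation.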
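The role of spectral normalization in the guidance policy can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the tanh activation, the omission of biases, and the names `spectrally_normalize`, `make_policy`, and `policy` are all assumptions made for illustration. The point it shows is that capping each layer's largest singular value caps the Lipschitz constant of the whole network, which is the kind of property a robustness bound can build on.

```python
import numpy as np

def spectrally_normalize(weights, bound=1.0):
    """Rescale a weight matrix so its largest singular value is at most `bound`."""
    sigma = np.linalg.norm(weights, ord=2)  # largest singular value
    if sigma == 0.0:
        return weights
    return weights * min(1.0, bound / sigma)

def make_policy(layer_sizes, rng, bound=1.0):
    """Build spectrally normalized weight matrices (biases omitted for brevity)."""
    return [
        spectrally_normalize(rng.standard_normal((n_out, n_in)), bound)
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def policy(weights, state):
    """Evaluate a tanh MLP. tanh is 1-Lipschitz, so the network's Lipschitz
    constant is at most the product of the layers' spectral norms."""
    h = state
    for w in weights[:-1]:
        h = np.tanh(w @ h)
    return weights[-1] @ h

# Hypothetical sizes: 6-D relative state in, 3-D control command out.
rng = np.random.default_rng(0)
weights = make_policy([6, 32, 32, 3], rng)
lipschitz_bound = float(np.prod([np.linalg.norm(w, ord=2) for w in weights]))
```

Two nearby state estimates `a` and `b` then yield commands that differ by at most `lipschitz_bound * np.linalg.norm(a - b)`, so a bounded state-estimation error translates into a bounded change in the policy output.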

translated by Google Translate

Learning-based methods to model small body gravity fields for proximity operations: Safety and Robustness

Daniel Neamati , Yashwanth Kumar Nakka , Soon-Jo Chung

Categories: Robotics | Machine Learning

2021-12-18

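Accurate gravity field models are essential for safe proximity operations around small bodies. State-of-the-art techniques use spherical harmonics or high-fidelity polyhedron shape models. Unfortunately, these techniques can become inaccurate near the surface of the small body or have high computational costs, especially for binary or heterogeneous small bodies. New learning-based techniques do not encode a predefined structure and are more versatile. In exchange for this versatility, learning-based techniques can be less robust outside the training data domain. In deployment, the spacecraft trajectory is the primary source of dynamics data. Therefore, the training data domain should include spacecraft trajectories in order to accurately evaluate the safety and robustness of the learned model. We develop a new method for learning-based gravity models that directly uses the spacecraft's past trajectories. We further introduce a method to evaluate the safety and robustness of learning-based techniques by comparing accuracy inside and outside the training domain. We demonstrate this safety and robustness method for two learning-based frameworks: Gaussian processes and neural networks. Along with the detailed analysis provided, we empirically establish the need for robustness verification of learned gravity models when they are used for proximity operations.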
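The inside-versus-outside comparison described above can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not the paper's method: the field is a point-mass potential in normalized units (GM = 1), the model is a zero-mean Gaussian process with a squared-exponential kernel, and the radii, length scale, and function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel between two 1-D arrays of radii."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def potential(r):
    """Point-mass potential with GM = 1; a stand-in for a real small-body field."""
    return -1.0 / r

def gp_fit_predict(r_train, u_train, r_query, jitter=1e-6):
    """Zero-mean GP regression: posterior mean at the query radii."""
    k_train = rbf_kernel(r_train, r_train) + jitter * np.eye(len(r_train))
    alpha = np.linalg.solve(k_train, u_train)
    return rbf_kernel(r_query, r_train) @ alpha

def rmse(pred, truth):
    """Root-mean-square error between prediction and ground truth."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# Training data: radii visited by a hypothetical past trajectory.
r_train = np.linspace(2.0, 3.0, 25)
u_train = potential(r_train)

r_inside = np.linspace(2.02, 2.98, 40)   # within the training domain
r_outside = np.linspace(0.5, 1.0, 40)    # closer to the body than any training sample

rmse_inside = rmse(gp_fit_predict(r_train, u_train, r_inside), potential(r_inside))
rmse_outside = rmse(gp_fit_predict(r_train, u_train, r_outside), potential(r_outside))
```

Comparing `rmse_inside` with `rmse_outside` gives a simple robustness indicator: a model that looks accurate along past trajectories can still be unreliable at radii the spacecraft has not yet visited.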

Contraction Theory for Nonlinear Stability Analysis and Learning-based Control: A Tutorial Overview

Hiroyasu Tsukamoto , Soon-Jo Chung , Jean-Jacques E. Slotine

Categories: Machine Learning | Robotics

2021-10-01

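Contraction theory is an analytical tool for studying the differential dynamics of a non-autonomous (i.e., time-varying) nonlinear system under a contraction metric defined by a uniformly positive definite matrix, the existence of which gives a necessary and sufficient characterization of the incremental exponential stability of multiple solution trajectories with respect to each other. By using a squared differential length as a Lyapunov-like function, its nonlinear stability analysis boils down to finding a suitable contraction metric that satisfies a stability condition expressed as a linear matrix inequality, indicating that many parallels can be drawn between well-known linear systems theory and contraction theory for nonlinear systems. Furthermore, contraction theory takes advantage of the superior robustness property of exponential stability used in conjunction with the comparison lemma. This yields much-needed safety and stability guarantees for neural network-based control and estimation schemes, without resorting to the more involved method of using uniform asymptotic stability for input-to-state stability. These distinctive features permit the systematic construction of a contraction metric via convex optimization, thereby yielding an explicit exponential bound on the distance between a time-varying target trajectory and solution trajectories perturbed externally by disturbances and learning errors. The objective of this paper is therefore to present a tutorial overview of contraction theory and its advantages in the nonlinear stability analysis of deterministic and stochastic systems, with an emphasis on deriving formal robustness and stability guarantees for various learning-based and data-driven automatic control methods. In particular, we provide a detailed review of techniques for finding contraction metrics and the associated control and estimation laws using deep neural networks.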
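The stability condition summarized above can be written out as a short worked statement. The symbols ($f$, $M$, $\alpha$, $\underline{m}$, $\overline{m}$) follow common usage in the contraction literature and are an assumption here, since the abstract fixes no notation.

```latex
\begin{align*}
  % System and its differential (virtual displacement) dynamics
  \dot{x} &= f(x,t), &
  \delta\dot{x} &= \frac{\partial f}{\partial x}(x,t)\,\delta x, \\
  % Contraction metric: uniformly positive definite
  \underline{m}\, I &\preceq M(x,t) \preceq \overline{m}\, I, &
  0 &< \underline{m} \le \overline{m}, \\
  % Contraction condition with rate \alpha > 0
  \dot{M} + M \frac{\partial f}{\partial x}
    + \left(\frac{\partial f}{\partial x}\right)^{\!\top} M
    &\preceq -2\alpha M, \\
  % Lyapunov-like function: squared differential length
  V = \delta x^{\top} M\, \delta x
    \;&\Longrightarrow\; \dot{V} \le -2\alpha V, \\
  % Resulting incremental exponential bound
  \|\delta x(t)\| &\le \sqrt{\overline{m}/\underline{m}}\;
    e^{-\alpha t}\, \|\delta x(0)\|.
\end{align*}
```

The last line follows from $\underline{m}\,\|\delta x(t)\|^2 \le V(t) \le e^{-2\alpha t}\, V(0) \le e^{-2\alpha t}\, \overline{m}\, \|\delta x(0)\|^2$, which is where the comparison lemma enters.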

Data Valuation Without Training of a Model

Nohyun Ki , Hoyong Choi , Hye Won Chung

Categories: Machine Learning

2023-01-03

Many recent works on understanding deep learning try to quantify how much individual data instances influence the optimization and generalization of a model, either by analyzing the behavior of the model during training or by measuring the performance gap of the model when the instance is removed from the dataset. Such approaches reveal characteristics and importance of individual instances, which may provide useful information in diagnosing and improving deep learning. However, most of the existing works on data valuation require actual training of a model, which often demands high computational cost. In this paper, we provide a training-free data valuation score, called complexity-gap score, which is a data-centric score to quantify the influence of individual instances in generalization of two-layer overparameterized neural networks. The proposed score can quantify irregularity of the instances and measure how much each data instance contributes to the total movement of the network parameters during training. We theoretically analyze and empirically demonstrate the effectiveness of the complexity-gap score in finding 'irregular or mislabeled' data instances, and also provide applications of the score in analyzing datasets and diagnosing training dynamics.

X-MAS: Extremely Large-Scale Multi-Modal Sensor Dataset for Outdoor Surveillance in Real Environments

DongKi Noh , Changki Sung , Teayoung Uhm , WooJu Lee , Hyungtae Lim , Jaeseok Choi , Kyuewang Lee , Dasol Hong , Daeho Um , Inseop Chung

Categories: Robotics

2022-12-30

In robotics and computer vision communities, extensive studies have been widely conducted regarding surveillance tasks, including human detection, tracking, and motion recognition with a camera. Additionally, deep learning algorithms are widely utilized in the aforementioned tasks as in other computer vision tasks. Existing public datasets are insufficient to develop learning-based methods that handle various surveillance for outdoor and extreme situations such as harsh weather and low illuminance conditions. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS) containing more than 500,000 image pairs and the first-person view data annotated by well-trained annotators. Moreover, a single pair contains multi-modal data (e.g. an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). This is the first large-scale first-person view outdoor multi-modal dataset focusing on surveillance tasks to the best of our knowledge. We present an overview of the proposed dataset with statistics and present methods of exploiting our dataset with deep learning-based algorithms. The latest information on the dataset and our study are available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.

Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing

Hyeonsu Jeong , Hye Won Chung

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-29

Crowdsourcing has emerged as an effective platform to label a large volume of data in a cost- and time-efficient manner. Most previous works have focused on designing an efficient algorithm to recover only the ground-truth labels of the data. In this paper, we consider multi-choice crowdsourced labeling with the goal of recovering not only the ground truth but also the most confusing answer and the confusion probability. The most confusing answer provides useful information about the task by revealing the most plausible answer other than the ground truth and how plausible it is. To theoretically analyze such scenarios, we propose a model where there are top-two plausible answers for each task, distinguished from the rest of choices. Task difficulty is quantified by the confusion probability between the top two, and worker reliability is quantified by the probability of giving an answer among the top two. Under this model, we propose a two-stage inference algorithm to infer the top-two answers as well as the confusion probability. We show that our algorithm achieves the minimax optimal convergence rate. We conduct both synthetic and real-data experiments and demonstrate that our algorithm outperforms other recent algorithms. We also show the applicability of our algorithms in inferring the difficulty of tasks and training neural networks with the soft labels composed of the top-two most plausible classes.

Large Language Models Encode Clinical Knowledge

Karan Singhal , Shekoofeh Azizi , Tao Tu , S. Sara Mahdavi , Jason Wei , Hyung Won Chung , Nathan Scales , Ajay Tanwani , Heather Cole-Lewis , Stephen Pfohl

Categories: Natural Language Processing

2022-12-26

Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.

Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization

Daesung Kim , Hye Won Chung

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-19

The nonconvex formulation of matrix completion problem has received significant attention in recent years due to its affordable complexity compared to the convex formulation. Gradient descent (GD) is the simplest yet efficient baseline algorithm for solving nonconvex optimization problems. The success of GD has been witnessed in many different problems in both theory and practice when it is combined with random initialization. However, previous works on matrix completion require either careful initialization or regularizers to prove the convergence of GD. In this work, we study the rank-1 symmetric matrix completion and prove that GD converges to the ground truth when small random initialization is used. We show that in logarithmic amount of iterations, the trajectory enters the region where local convergence occurs. We provide an upper bound on the initialization size that is sufficient to guarantee the convergence and show that a larger initialization can be used as more samples are available. We observe that implicit regularization effect of GD plays a critical role in the analysis, and for the entire trajectory, it prevents each entry from becoming much larger than the others.
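The setting can be tried out with a minimal NumPy sketch: plain gradient descent on the nonconvex rank-1 symmetric objective, started from a small random vector, with no regularizer and no spectral initialization. The problem size, sampling rate, step size, initialization scale, iteration count, and the function name `gd_rank1_completion` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def gd_rank1_completion(observed, mask, step=0.1, init_scale=1e-3, iters=3000, seed=0):
    """Gradient descent on f(u) = (1 / 4p) * ||mask * (u u^T - M)||_F^2.

    `observed` holds M on the sampled entries and 0 elsewhere; `mask` is the
    symmetric 0/1 sampling pattern.
    """
    rng = np.random.default_rng(seed)
    n = observed.shape[0]
    p = mask.mean()                                        # empirical sampling rate
    u = init_scale * rng.standard_normal(n) / np.sqrt(n)   # small random initialization
    for _ in range(iters):
        residual = mask * np.outer(u, u) - observed        # error on observed entries only
        u = u - step * (residual @ u) / p                  # gradient step (mask is symmetric)
    return u

# Synthetic instance: M = x x^T with a unit-norm ground truth x.
rng = np.random.default_rng(1)
n, p = 60, 0.7
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
upper = np.triu(rng.random((n, n)) < p)        # sample the upper triangle, diagonal included
mask = (upper | upper.T).astype(float)         # symmetrize the sampling pattern
observed = mask * np.outer(x, x)

u_hat = gd_rank1_completion(observed, mask)
# The sign of x is not identifiable from x x^T, so compare against both +x and -x.
rel_error = min(np.linalg.norm(u_hat - x), np.linalg.norm(u_hat + x))
```

In this toy run the iterate stays small for the first several dozen steps while its component along the ground truth grows, then settles near `x` or `-x`; `rel_error` measures how close the final iterate is.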

MEIL-NeRF: Memory-Efficient Incremental Learning of Neural Radiance Fields

Jaeyoung Chung , Kanggeon Lee , Sungyong Baik , Kyoung Mu Lee

Categories: Computer Vision

2022-12-16

Hinged on the representation power of neural networks, neural radiance fields (NeRF) have recently emerged as one of the promising and widely applicable methods for 3D object and scene representation. However, NeRF faces challenges in practical applications, such as large-scale scenes and edge devices with a limited amount of memory, where data needs to be processed sequentially. Under such incremental learning scenarios, neural networks are known to suffer catastrophic forgetting: easily forgetting previously seen data after training with new data. We observe that previous incremental learning algorithms are limited by either low performance or memory scalability issues. As such, we develop a Memory-Efficient Incremental Learning algorithm for NeRF (MEIL-NeRF). MEIL-NeRF takes inspiration from NeRF itself in that a neural network can serve as a memory that provides the pixel RGB values, given rays as queries. Upon the motivation, our framework learns which rays to query NeRF to extract previous pixel values. The extracted pixel values are then used to train NeRF in a self-distillation manner to prevent catastrophic forgetting. As a result, MEIL-NeRF demonstrates constant memory consumption and competitive performance.

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

Hirofumi Inaguma , Sravya Popuri , Ilia Kulikov , Peng-Jen Chen , Changhan Wang , Yu-An Chung , Yun Tang , Ann Lee , Shinji Watanabe , Juan Pino

Categories: Natural Language Processing

2022-12-15

Direct speech-to-speech translation (S2ST), in which all components can be optimized jointly, is advantageous over cascaded approaches to achieve fast inference with a simplified pipeline. We present a novel two-pass direct S2ST architecture, UnitY, which first generates textual representations and predicts discrete acoustic units subsequently. We enhance the model performance by subword prediction in the first-pass decoder, advanced two-pass decoder architecture design and search strategy, and better training regularization. To leverage large amounts of unlabeled text data, we pre-train the first-pass text decoder based on the self-supervised denoising auto-encoding task. Experimental evaluations on benchmark datasets at various data scales demonstrate that UnitY outperforms a single-pass speech-to-unit translation model by 2.5-4.2 ASR-BLEU with 2.83x decoding speed-up. We show that the proposed methods boost the performance even when predicting spectrogram in the second pass. However, predicting discrete units achieves 2.51x decoding speed-up compared to that case.
